In the past few weeks I've upgraded a couple of Ubuntu 20.04 (Focal) machines to a higher LTS version, either Ubuntu 22.04 (Jammy) or 24.04 (Noble). Aside from a few minor upgrade gotchas, every single machine was successfully upgraded.
Every single machine?
No! One single virtual machine, in the vast territory of serverland, refused to operate under the new LTS emperor!
What happened?
After the distribution upgrade process (apt-get dist-upgrade) had run through successfully, it was time to reboot the system. While checking the monitoring, I realized the machine was taking quite some time to come back online. Too much time.
This is when I connected to the hypervisor and to the virtual machine's console and found this:
The error on the console didn't look good. A Kernel panic showed that the root file system could not be mounted.
Please append a correct "root=" boot option; here are the available partitions
[...]
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.0-88-generic #99-Ubuntu
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
dump_stack+0x6d/0x8b
panic+0x101/0x2e3
mount_block_root+0x23f/0x2e8
mount_root+0x38/0x3a
prepare_namespace+0x13f/0x194
kernel_init_freeable+0x23f/0x263
? rest_init+0xb0/0xb0
kernel_init+0xe/0x110
ret_from_fork+0x35/0x40
Kernel Offset: 0x2ac00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
It seemed that the newer Kernel (5.4.0-88) wouldn't let the system boot... Even the recovery boot option with that Kernel wouldn't boot correctly.
Luckily the previously used Kernel (4.15.0) was still available in the Grub2 menu. I selected the Recovery Mode with this Kernel version and the (recovery) system successfully booted. I could drop to the root shell and everything seemed to be correctly mounted and working.
Since the system works under this Kernel, all needed modules (e.g. for LVM) are evidently available there. Maybe the initramfs built for the new Kernel during the distribution upgrade was lacking a required module?
So my first thought was to manually rebuild the initramfs of the broken Kernel (5.4.0) and then regenerate the Grub2 boot configuration.
root@recovery:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.4.0-88-generic
root@recovery:~# update-grub
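One way to test the missing-module theory directly is to look inside the generated initramfs. A minimal sketch, assuming lsinitramfs (part of initramfs-tools) is available; the helper name initramfs_has_module and the module dm-mod are only illustrative and not taken from the original troubleshooting:

```shell
# Hypothetical helper: reads an lsinitramfs listing on stdin and succeeds
# if a kernel module file with the given name is part of the image.
# Module files may be compressed (.ko.zst, .ko.xz), so a suffix is allowed.
# Note: file names can use dashes where lsmod shows underscores (dm-mod vs dm_mod).
initramfs_has_module() {
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}

# Usage on the affected system (not executed by this snippet):
#   lsinitramfs /boot/initrd.img-5.4.0-88-generic | initramfs_has_module dm-mod \
#       && echo "present" || echo "missing"
```

Running the same check against the initramfs of the working Kernel gives a baseline to compare with.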
But after another reboot, the default boot entry with Kernel 5.4.0 still ran into the same Kernel panic!
The machine in question has been around for many years and has gone through multiple LTS distribution upgrades. Maybe the virtual hardware version was outdated and too old for the newer Ubuntu 22.04 LTS with the newer Kernel?
As it turned out, there was indeed a very old virtual hardware version (VM version 11) active on this VM.
But even after upgrading the virtual hardware (nowadays called VM Compatibility Upgrade) to VM version 19, the machine still did not want to boot.
Maybe I just got really unlucky and the distribution upgrade installed a buggy Kernel version? Maybe an Ubuntu Kernel patch would already be in the pipeline?
When I checked the changelog of this Kernel version, I saw the date (September 2021) and realized something important: WHY Kernel 5.4.0?! Kernel 5.4.0 is not new; there should be a newer Kernel version coming with Ubuntu 22.04!
I checked the linux-image-generic package and, indeed, the initial Kernel version under Ubuntu 22.04 (Jammy) should have been 5.15.0.
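Such a check can also be done on the command line. A minimal sketch, assuming the usual indented "Depends:" lines printed by apt-cache depends; the helper name kernel_images_from_depends is hypothetical, and this is not necessarily how the check was done here:

```shell
# Hypothetical helper: reads `apt-cache depends <meta-package>` output on stdin
# and prints the versioned Kernel image packages the meta package depends on.
kernel_images_from_depends() {
    awk '$1 == "Depends:" && $2 ~ /^linux-image-[0-9]/ { print $2 }'
}

# Usage (not executed by this snippet):
#   apt-cache depends linux-image-generic | kernel_images_from_depends
```

apt-cache works for packages that are not installed, so the lookup does not depend on the state of the local system.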
Something really strange is going on with this machine. Why wasn't the newer Kernel version installed during the Ubuntu distribution upgrade?
At this point I was really confused, and my priority was to get this machine back into operation. I decided to remove the broken Kernel from the system and boot the system normally with the still working (4.15.0) Kernel:
root@recovery:~# apt-get remove linux-image-5.4.0-88-generic
[...]
The following packages will be REMOVED:
linux-image-5.4.0-88-generic linux-modules-extra-5.4.0-88-generic
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 214 MB disk space will be freed.
Do you want to continue? [Y/n] y
The removal of the Kernel packages automatically triggered another update-initramfs and update-grub. The Grub2 bootloader now had only one boot entry remaining: the Ubuntu entry with Kernel 4.15.0.
The system booted successfully, with all filesystems mounted and services running.
The server operated correctly again, running under Ubuntu 22.04, albeit with an older Kernel. This finally gave me time for a proper analysis without "stress" during a (longer than expected) downtime.
A look at the installed Linux Kernel package showed something interesting:
root@jammy:~# dpkg -l|grep linux-image-generic
root@jammy:~#
There was none!
Well, actually there were some:
root@jammy:~# dpkg -l|grep linux-image
ii linux-image-4.15.0-158-generic 4.15.0-158.166 amd64 Signed kernel image generic
rc linux-image-4.4.0-104-generic 4.4.0-104.127 amd64 Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc linux-image-4.4.0-130-generic 4.4.0-130.156 amd64 Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc linux-image-4.4.0-141-generic 4.4.0-141.167 amd64 Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc linux-image-4.4.0-166-generic 4.4.0-166.195 amd64 Signed kernel image generic
rc linux-image-4.4.0-179-generic 4.4.0-179.209 amd64 Signed kernel image generic
rc linux-image-4.4.0-210-generic 4.4.0-210.242 amd64 Signed kernel image generic
rc linux-image-4.4.0-31-generic 4.4.0-31.50 amd64 Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc linux-image-4.4.0-83-generic 4.4.0-83.106 amd64 Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc linux-image-5.4.0-88-generic 5.4.0-88.99 amd64 Signed kernel image generic
rc linux-image-5.4.0-91-generic 5.4.0-91.102 amd64 Signed kernel image generic
rc linux-image-extra-4.4.0-104-generic 4.4.0-104.127 amd64 Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc linux-image-extra-4.4.0-130-generic 4.4.0-130.156 amd64 Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc linux-image-extra-4.4.0-141-generic 4.4.0-141.167 amd64 Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc linux-image-extra-4.4.0-31-generic 4.4.0-31.50 amd64 Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc linux-image-extra-4.4.0-83-generic 4.4.0-83.106 amd64 Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc linux-image-unsigned-4.4.0-168-generic 4.4.0-168.197 amd64 Linux kernel image for version 4.4.0 on 64 bit x86 SMP
A couple of previously installed Kernel versions showed up, including the 5.4.0-88 version which refused to boot. However, the linux-image-generic meta package was clearly missing on this system.
It's the linux-image-generic (or a similar) Kernel meta package which decides which Kernel should be installed on this Ubuntu system. During a distribution upgrade, the linux-image-generic meta package is also upgraded - and this would trigger the installation of a newer Kernel (5.15.0 for Ubuntu 22.04). Would... as long as the Kernel meta package is installed!
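A check for this situation can be scripted. A minimal sketch, assuming dpkg-query with the ${db:Status-Abbrev} format field; the helper name installed_kernel_meta is hypothetical, and the name pattern is an assumption that only covers the generic and virtual flavours:

```shell
# Hypothetical helper: reads "<status> <package>" lines on stdin and prints
# the Kernel meta packages that are fully installed (status "ii").
# Versioned images such as linux-image-4.15.0-158-generic do not match.
installed_kernel_meta() {
    awk '$1 == "ii" && $2 ~ /^linux-(image-)?(generic|virtual)/ { print $2 }'
}

# Usage (not executed by this snippet):
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'linux-*' | installed_kernel_meta
```

Empty output means no meta package is tracking the Kernel. Installing one again (for example apt-get install linux-image-generic) should then let APT pull in the current Kernel of the release; that step is an assumption here and was not part of the analysis above.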
Now the remaining question: Why would the Kernel meta package linux-image-generic be missing on this particular system?
A search through the APT history logs showed the hard truth (from the past):
root@jammy:~# zcat /var/log/apt/history.log.6.gz
[...]
Start-Date: 2021-12-14 17:12:08
Commandline: apt-get remove linux-image-5.4.0-91-generic
Remove: linux-image-generic:amd64 (5.4.0.91.95), linux-image-5.4.0-91-generic:amd64 (5.4.0-91.102), linux-modules-extra-5.4.0-91-generic:amd64 (5.4.0-91.102), linux-generic:amd64 (5.4.0.91.95)
End-Date: 2021-12-14 17:14:31
[...]
Shakira's hips don't lie. On Linux the same rule applies to logs.
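Paging through every rotated log by hand gets tedious. A minimal sketch for finding such a transaction, assuming the stanza layout shown in the log excerpt (Start-Date, Commandline, Remove); the helper name removals_of is hypothetical:

```shell
# Hypothetical helper: reads APT history log text on stdin and prints the
# Start-Date and Commandline of every transaction that removed or purged
# the given package.
removals_of() {
    awk -v pkg="$1" '
        /^Start-Date:/  { start = $0; cmd = "" }
        /^Commandline:/ { cmd = $0 }
        # Package entries look like " name:arch (version)"; match on " name:"
        /^(Remove|Purge):/ && index($0, " " pkg ":") { print start; print cmd }
    '
}

# Usage (not executed by this snippet); zcat -f also passes through the
# uncompressed current log:
#   zcat -f /var/log/apt/history.log* | removals_of linux-image-generic
```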
Some admin manually removed a specific Kernel package in December 2021 (!!!). This not only removed the selected Kernel version, but also the meta package (linux-image-generic). Hence the missing package, hence the missing new Kernel after the distribution upgrade.
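This kind of collateral removal can be spotted before it happens by simulating the operation first. A minimal sketch, assuming the "Remv <package> [<version>]" lines that apt-get -s prints; the helper name meta_packages_in_simulation and its name pattern are hypothetical:

```shell
# Hypothetical helper: reads the output of a simulated apt-get run on stdin
# and prints any Kernel meta packages that would be removed along the way.
meta_packages_in_simulation() {
    awk '$1 == "Remv" && $2 ~ /^linux-(image-|headers-)?(generic|virtual)/ { print $2 }'
}

# Usage (not executed by this snippet); -s only simulates, nothing is changed:
#   apt-get -s remove linux-image-5.4.0-91-generic | meta_packages_in_simulation
```

Any package name printed by this filter is a hint to reconsider the removal, since losing the meta package means later upgrades will no longer bring a new Kernel.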
Mystery solved!